Markov Decision Models with Weighted Discounted Criteria

Abstract

We consider a discrete-time Markov decision process with an infinite horizon. The criterion to be maximized is the sum of several standard discounted rewards, each with a different discount factor. Such criteria arise in modeling investments, in modeling projects of different durations and systems with different time scales, and in some axiomatic formulations of multi-attribute preference theory. We show that, for this criterion and some positive ε, an ε-optimal (randomized) stationary strategy need not exist, even when the state and action sets are finite. However, ε-optimal Markov (non-randomized) strategies and optimal Markov strategies exist under weak conditions. We exhibit ε-optimal Markov strategies that are stationary from some time onward. When both the state and action spaces are finite, there exists an optimal Markov strategy with this property. We provide an explicit algorithm for computing such strategies.
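To make the criterion concrete, the following minimal sketch evaluates a sum of discounted rewards with different discount factors for a given Markov strategy on a finite model, truncated at a finite horizon. It is not the algorithm announced in the abstract. The function name, the data layout, and the one-state, two-action toy model are all assumptions introduced here for illustration. In this toy model, a strategy that plays one action for two steps and then switches permanently does better than either stationary choice, which is consistent with the claim about strategies that are stationary from some time onward.

```python
from typing import Callable, List

def weighted_discounted_value(
    P: List[List[List[float]]],        # P[s][a][s2]: transition probabilities
    rewards: List[List[List[float]]],  # rewards[k][s][a]: k-th reward function
    betas: List[float],                # betas[k]: discount factor of reward k
    policy: Callable[[int, int], int], # Markov strategy: (time, state) -> action
    start: int,
    horizon: int,
) -> float:
    """Truncated value: sum over t < horizon and k of betas[k]**t * E[r_k]."""
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0                  # state distribution at time t
    total = 0.0
    for t in range(horizon):
        nxt = [0.0] * n
        for s, p in enumerate(dist):
            if p == 0.0:
                continue
            a = policy(t, s)
            # Add every discounted reward component earned at time t.
            total += p * sum(b ** t * r[s][a] for b, r in zip(betas, rewards))
            for s2, q in enumerate(P[s][a]):
                nxt[s2] += p * q
        dist = nxt
    return total

# Hypothetical toy model: one state, actions 0 and 1.
P = [[[1.0], [1.0]]]
rewards = [[[1.0, 0.0]],   # reward 1 pays 1 for action 0, discounted by 0.5
           [[0.0, 0.5]]]   # reward 2 pays 0.5 for action 1, discounted by 0.9
betas = [0.5, 0.9]

v_a = weighted_discounted_value(P, rewards, betas, lambda t, s: 0, 0, 400)
v_b = weighted_discounted_value(P, rewards, betas, lambda t, s: 1, 0, 400)
# Play action 0 at times 0 and 1, then action 1 forever.
v_switch = weighted_discounted_value(
    P, rewards, betas, lambda t, s: 0 if t < 2 else 1, 0, 400)

print(v_a, v_b, v_switch)
```

In this one-state example, a randomized stationary strategy that plays action 0 with probability p earns the mixture p * v_a + (1 - p) * v_b, which can never exceed the larger of the two stationary values. The switching strategy is therefore strictly better than every stationary strategy here.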